Introduction

This week I will move on to modeling other chemical exposure data from the DNT60 study. Five chemicals will be randomly selected from the DNT60 set and their exposure data will be modeled separately. I plan on doing one chemical per week, but if I can go at a faster rate I will. Modeling each of these datasets will better inform the ideal structure of our final model. What data transformation will we use? What fixed effects? What random effects? What correlation structure? Will we use centered time data? Etc.

Randomly Select Five Chemicals

Padilla DNT60 data is loaded (not shown) and five chemicals are randomly selected from the set of 61.

# Isolate test chemical names from DNT60 set
set.seed(1)
chemicals <- unique( DNT60pmr0_long[DNT60pmr0_long$wllt=="t"]$cpid )
chemicals.to.model <- sample(chemicals, 5, replace=FALSE)
chemicals.to.model
## [1] "Tebuconazole"          "6-propyl-2-thiouracil" "Hexachlorophene (#15)"
## [4] "5,5-diphenylhydandoin" "Fluoxetine"

With the seed set above, the randomly selected chemicals should remain the same each time this document is knit. For the record, here are the chemicals that were selected: Tebuconazole, 6-propyl-2-thiouracil, Hexachlorophene, 5,5-diphenylhydandoin, and Fluoxetine.

Tebuconazole

Isolate and divide Tebuconazole data into Light and Dark. This will not be shown.

Plot raw Light and Dark data for Tebuconazole.

In prior analysis Tebuconazole was found to affect zebrafish startle response, which is visually apparent in the Dark.

Identify Optimal Box-Cox Transformation Parameters

Identify optimal data transformations for both Light and Dark data. It was found that the square-root transformation was optimal for Loperamide data.

# Light Data
to_optimize.L <- as.data.frame(data.L)
to_optimize.L$t <- as.factor(to_optimize.L$t)
to_optimize.L$y <- to_optimize.L$y + 1
bc_params.L <- MASS::boxcox(y ~ t, data = to_optimize.L,
                         lambda = seq(-3, 3, by = 0.25),
                         plotit = FALSE)
i <- which.max(bc_params.L$y)
lam.L <- bc_params.L$x[i]
lam.L
## [1] -0.75
# Dark Data
to_optimize.D <- as.data.frame(data.D)
to_optimize.D$t <- as.factor(to_optimize.D$t)
to_optimize.D$y <- to_optimize.D$y + 1
bc_params.D <- MASS::boxcox(y ~ t, data = to_optimize.D,
                         lambda = seq(-3, 3, by = 0.25),
                         plotit = FALSE)
i <- which.max(bc_params.D$y)
lam.D <- bc_params.D$x[i]
lam.D
## [1] 0.5

The Dark data is once again optimally transformed using the square-root power. The Light data is much different. Visualize the log-likelihood function for the Light data.
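As a quick sketch, the profile log-likelihood can be drawn directly from the bc_params.L object computed above (the dashed line marks the selected lambda):

# Plot the Box-Cox profile log-likelihood over the lambda grid
plot(bc_params.L$x, bc_params.L$y, type = "l",
     xlab = "lambda", ylab = "log-likelihood",
     main = "Box-Cox profile, Light data")
abline(v = lam.L, lty = 2)  # optimal lambda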

I will use the optimal transformation identified by Box-Cox, visualize the residuals of the model, and compare against the square-root transformed data.

Model Light Data

Tebuconazole Light data will be transformed and a hierarchical structure declared.

# Transform data
# Optimal lambda
data.L.lam <- copy(data.L)
data.L.lam[, lamy := (y+1)^(lam.L)]

# Square-root
data.L.sqrt <- copy(data.L)
data.L.sqrt[, sqrty := sqrt(y)]
Light.lam <- groupedData(lamy ~ t | fishID, data = as.data.frame(data.L.lam[!(fishID%in%remove)]), order.groups = FALSE)
Light.sqrt <- groupedData(sqrty ~ t | fishID, data = as.data.frame(data.L.sqrt[!(fishID%in%remove)]), order.groups = FALSE)

Visualize transformed Light data.

The relationship of movement with time changes completely under the optimal data transformation. Both datasets have curvature at the beginning of the Light phase that was poorly modeled for the Loperamide exposure data. The optimally transformed data has the same sort of trend as the Dark, which was well modeled using quadratic polynomials.

Visualize individual fish trends with time for optimal transform.

The data are highly variable. It almost seems that a quadratic term will be unnecessary for the individual models, which is what was found for the Loperamide data.

Square root transform.

Again behavior is highly variable. Some fish display cubic behavior, some quadratic, etc. There doesn’t seem to be an obvious optimal model.

Begin assessing what random effects will be necessary. Fit quadratic regressions to each fish individually. Assess the correlation of parameters and evaluate the effect of centering the time data. Then evaluate the significance of the regression parameters across the sample.
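A minimal sketch of the per-fish fits, assuming nlme is loaded and Light.lam is the groupedData object created above; lmList() fits a separate lm() within each fishID:

# Fit a quadratic regression to each fish individually
fits.lam <- lmList(lamy ~ poly(t, degree = 2, raw = TRUE) | fishID,
                   data = Light.lam)
# Scatterplot matrix of per-fish coefficient estimates to eyeball
# correlation among intercept, linear, and quadratic terms
pairs(coef(fits.lam))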

Lambda transformed correlation.

Square-root transformed correlation.

The degree of correlation seems about the same. The plots appear strange for the square-root data due to the presence of one extreme outlier, DNT-82.6.11. Now evaluate the effect of centering the time data. Optimal lambda data first.
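The centering itself can be sketched as follows (assumes data.L.lam from above; centering reduces collinearity between the polynomial terms, which shows up as lower correlation among the coefficient estimates):

# Center time at its mean, then refit the per-fish regressions
data.L.lam[, tc := t - mean(t)]
fits.lam.c <- lmList(lamy ~ poly(tc, degree = 2, raw = TRUE) | fishID,
                     data = as.data.frame(data.L.lam))
pairs(coef(fits.lam.c))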

Square-root data.

Correlation is reduced immensely. The effect won't be this apparent once we begin to fit LME models, but I am sure it will still matter.

Evaluate the significance of individual terms. Start with Optimal lambda data.

Square-root data.

Significance of the intercept and slope terms is apparent in both cases, while the significance of the quadratic term appears questionable. Luckily, this is what was observed for the Loperamide data as well, an indication that the model used for the Loperamide Light-phase data might be applicable here.

Begin to Fit LME Models

Start with a gls model with a correlation structure, as before.

gls.lam <- gls(lamy ~ poly(t,degree=2,raw=TRUE),
               data = Light.lam.1,
               method = "ML")
gls1.lam <- update(gls.lam, correlation=corAR1(form=~t|fishID))
gls2.lam <- update(gls.lam, correlation=corARMA(form=~t|fishID,p=1,q=1))
gls3.lam <- update(gls.lam, correlation=corARMA(form=~t|fishID,p=2,q=0))
gls4.lam <- update(gls.lam, correlation=corARMA(form=~t|fishID,p=2,q=2))

zz <- AIC(gls.lam, gls1.lam, gls2.lam, gls3.lam, gls4.lam)
normAIC(zz)
##          df        AIC
## gls1.lam  5    0.00000
## gls2.lam  6   31.32444
## gls3.lam  6   31.59660
## gls4.lam  8   33.67148
## gls.lam   4 1953.00228

AR(1) structure is found to be best.
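For reference, the estimated autocorrelation parameter can be pulled out of the fitted gls object (a sketch):

# Estimated AR(1) parameter on its natural (constrained) scale
coef(gls1.lam$modelStruct$corStruct, unconstrained = FALSE)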

gls.sqrt <- gls(sqrty ~ poly(t,degree=2,raw=TRUE),
               data = Light.sqrt.1,
               method = "ML")
gls1.sqrt <- update(gls.sqrt, correlation=corAR1(form=~t|fishID))
gls2.sqrt <- update(gls.sqrt, correlation=corARMA(form=~t|fishID,p=1,q=1))
gls3.sqrt <- update(gls.sqrt, correlation=corARMA(form=~t|fishID,p=2,q=0))
gls4.sqrt <- update(gls.sqrt, correlation=corARMA(form=~t|fishID,p=2,q=2))

zz <- AIC(gls.sqrt, gls1.sqrt, gls2.sqrt, gls3.sqrt, gls4.sqrt)
normAIC(zz)
##           df        AIC
## gls1.sqrt  5    0.00000
## gls2.sqrt  6   25.85791
## gls3.sqrt  6   25.97330
## gls4.sqrt  8   26.87666
## gls.sqrt   4 2164.62384

With prior square-root transformed zebrafish data we found that the AR(1) structure was ideal as well.

Identify optimal random effects structure.

ctrl <- lmeControl(opt = "optim", niterEM = 200, msTol = 1e-20, msMaxIter = 1000, apVar = FALSE)

lmm1.lam.L <- lme(lamy ~ conc*poly(t,degree=2,raw=TRUE),
            data = Light.lam.1,
            random = reStruct(~1|fishID,REML=FALSE,pdClass="pdSymm"),
            correlation = corAR1(form=~t|fishID),
            method = "ML",
            control= ctrl)
lmm2.lam.L <- lme(lamy ~ conc*poly(t,degree=2,raw=TRUE),
            data = Light.lam.1,
            random = reStruct(~poly(t,degree=1,raw=TRUE)|fishID,REML=FALSE,pdClass="pdSymm"),
            correlation = corAR1(form=~t|fishID),
            method = "ML",
            control = ctrl)
lmm3.lam.L <- lme(lamy ~ conc*poly(t,degree=2,raw=TRUE),
            data = Light.lam.1,
            random = reStruct(~poly(t,degree=2,raw=TRUE)|fishID,REML=FALSE,pdClass="pdSymm"),
            correlation = corAR1(form=~t|fishID),
            method = "ML",
            control = ctrl)
lmm4.lam.L <- lme(lamy ~ conc*poly(t,degree=3,raw=TRUE),
            data = Light.lam.1,
            random = reStruct(~poly(t,degree=3,raw=TRUE)|fishID,REML=FALSE,pdClass="pdSymm"),
            correlation = corAR1(form=~t|fishID),
            method = "ML",
            control = ctrl)

zz <- AIC(lmm1.lam.L,lmm2.lam.L,lmm3.lam.L,lmm4.lam.L)
normAIC(zz)
##            df       AIC
## lmm3.lam.L 26  0.000000
## lmm4.lam.L 36  1.371738
## lmm2.lam.L 23 10.827887
## lmm1.lam.L 21 19.321880

The best model is found to be lmm3.lam.L, whose random-effects structure was also found optimal for Loperamide. Perform the same determination of the random-effects structure for the square-root transformed data.

ctrl <- lmeControl(opt = "optim", niterEM = 200, msTol = 1e-20, msMaxIter = 1000, apVar = FALSE)

lmm1.sqrt.L <- lme(sqrty ~ conc*poly(t,degree=2,raw=TRUE),
            data = Light.sqrt.1,
            random = reStruct(~1|fishID,REML=FALSE,pdClass="pdSymm"),
            correlation = corAR1(form=~t|fishID),
            method = "ML",
            control = ctrl)
lmm2.sqrt.L <- lme(sqrty ~ conc*poly(t,degree=2,raw=TRUE),
            data = Light.sqrt.1,
            random = reStruct(~poly(t,degree=1,raw=TRUE)|fishID,REML=FALSE,pdClass="pdSymm"),
            correlation = corAR1(form=~t|fishID),
            method = "ML",
            control = ctrl)
lmm3.sqrt.L <- lme(sqrty ~ conc*poly(t,degree=2,raw=TRUE),
            data = Light.sqrt.1,
            random = reStruct(~poly(t,degree=2,raw=TRUE)|fishID,REML=FALSE,pdClass="pdSymm"),
            correlation = corAR1(form=~t|fishID),
            method = "ML",
            control = ctrl)
lmm4.sqrt.L <- lme(sqrty ~ conc*poly(t,degree=3,raw=TRUE),
              data = Light.sqrt.1,
              random = reStruct(~poly(t,degree=3,raw=TRUE)|fishID,REML=FALSE,pdClass="pdSymm"),
              correlation = corAR1(form=~t|fishID),
              method = "ML",
              control = ctrl)

zz <- AIC(lmm1.sqrt.L,lmm2.sqrt.L,lmm3.sqrt.L,lmm4.sqrt.L)
normAIC(zz)
##             df       AIC
## lmm3.sqrt.L 26  0.000000
## lmm4.sqrt.L 36  4.218021
## lmm2.sqrt.L 23 10.729456
## lmm1.sqrt.L 21 27.039105

The model with a quadratic random-effects term was once again found optimal. In the Loperamide model, lmm2 and lmm3 were interchangeable for square-root transformed data; here we see a more obvious benefit to modeling with a quadratic random effect.

Visualize Residuals of Optimal Models

Visualize the residuals of the optimal models for both the lambda transformed and square-root transformed data to determine whether one transformation-model combination produces more normal-looking residuals.
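A sketch of the comparison using nlme's qqnorm method; type = "n" requests normalized residuals, which account for the fitted AR(1) correlation:

# Subject-level residual QQ plots for each transformation
qqnorm(lmm3.lam.L, ~ resid(., type = "n"), abline = c(0, 1))
qqnorm(lmm3.sqrt.L, ~ resid(., type = "n"), abline = c(0, 1))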

Residuals are pretty bad for both transformations. For transformed data the subject-level (level=1) residuals look okay, but the sample-level residuals don't look so great. Picking a middle-ground transformation may provide better residuals.

Plot residuals against fitted values. lambda then sqrt, level=1.

level=0.

Heteroscedasticity in the data is very apparent at both levels, but is worst at the sample level, due largely, I am sure, to the zero values in the Light. These graphics suggest that using a different transformation may result in better-looking residuals.

Try log10.
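A sketch of how the log10 model (lmm3.log.L, summarized below) would be built, mirroring the lambda/sqrt pipeline above; the exact construction of the .1 datasets is not shown in this document, so this is approximate:

# Transform, declare hierarchy, and fit with the same structure
data.L.log <- copy(data.L)
data.L.log[, logy := log10(y + 1)]
Light.log.1 <- groupedData(logy ~ t | fishID,
                           data = as.data.frame(data.L.log[!(fishID %in% remove)]),
                           order.groups = FALSE)
lmm3.log.L <- lme(logy ~ conc*poly(t, degree = 2, raw = TRUE),
                  data = Light.log.1,
                  random = reStruct(~poly(t, degree = 2, raw = TRUE) | fishID,
                                    REML = FALSE, pdClass = "pdSymm"),
                  correlation = corAR1(form = ~t | fishID),
                  method = "ML",
                  control = ctrl)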

Evaluate residuals normality after log transformation.

The apparent normality of residuals appears to slightly improve at subject level. Evaluate residuals by fitted values to assess heteroscedasticity.

Heteroscedasticity is still very apparent in sample-level residuals.

I want to point out that the parameters show similar concentration-response changes across transformations. In other words, I don't know how much the transformations will affect the take-aways of our analysis after concentration-response fitting. It does seem that adopting the square-root transform could reduce our sensitivity.

summary(lmm3.lam.L)$tTable
##                                                   Value    Std.Error   DF
## (Intercept)                                5.850010e-01 0.0285735579 2838
## conc0.12                                   4.912319e-03 0.0619847500  144
## conc0.4                                   -9.508171e-03 0.0619847500  144
## conc1.2                                    1.284763e-02 0.0606136697  144
## conc4                                      1.243843e-01 0.0619847500  144
## conc12                                     2.849605e-02 0.0606136697  144
## poly(t, degree = 2, raw = TRUE)1          -1.831941e-02 0.0020551982 2838
## poly(t, degree = 2, raw = TRUE)2           1.607442e-03 0.0003716313 2838
## conc0.12:poly(t, degree = 2, raw = TRUE)1  8.169681e-05 0.0044583508 2838
## conc0.4:poly(t, degree = 2, raw = TRUE)1   2.629974e-03 0.0044583508 2838
## conc1.2:poly(t, degree = 2, raw = TRUE)1   4.590691e-03 0.0043597337 2838
## conc4:poly(t, degree = 2, raw = TRUE)1     8.704640e-03 0.0044583508 2838
## conc12:poly(t, degree = 2, raw = TRUE)1    1.469965e-02 0.0043597337 2838
## conc0.12:poly(t, degree = 2, raw = TRUE)2 -4.206434e-04 0.0008061816 2838
## conc0.4:poly(t, degree = 2, raw = TRUE)2  -1.100242e-04 0.0008061816 2838
## conc1.2:poly(t, degree = 2, raw = TRUE)2  -5.489477e-04 0.0007883491 2838
## conc4:poly(t, degree = 2, raw = TRUE)2    -7.268953e-04 0.0008061816 2838
## conc12:poly(t, degree = 2, raw = TRUE)2    2.556684e-05 0.0007883491 2838
##                                               t-value      p-value
## (Intercept)                               20.47351060 5.304288e-87
## conc0.12                                   0.07925045 9.369434e-01
## conc0.4                                   -0.15339532 8.783013e-01
## conc1.2                                    0.21195921 8.324386e-01
## conc4                                      2.00669238 4.665450e-02
## conc12                                     0.47012587 6.389763e-01
## poly(t, degree = 2, raw = TRUE)1          -8.91369542 8.638320e-19
## poly(t, degree = 2, raw = TRUE)2           4.32536802 1.575317e-05
## conc0.12:poly(t, degree = 2, raw = TRUE)1  0.01832445 9.853813e-01
## conc0.4:poly(t, degree = 2, raw = TRUE)1   0.58989830 5.553058e-01
## conc1.2:poly(t, degree = 2, raw = TRUE)1   1.05297511 2.924420e-01
## conc4:poly(t, degree = 2, raw = TRUE)1     1.95243506 5.098476e-02
## conc12:poly(t, degree = 2, raw = TRUE)1    3.37168515 7.571029e-04
## conc0.12:poly(t, degree = 2, raw = TRUE)2 -0.52177249 6.018695e-01
## conc0.4:poly(t, degree = 2, raw = TRUE)2  -0.13647576 8.914549e-01
## conc1.2:poly(t, degree = 2, raw = TRUE)2  -0.69632564 4.862819e-01
## conc4:poly(t, degree = 2, raw = TRUE)2    -0.90165215 3.673182e-01
## conc12:poly(t, degree = 2, raw = TRUE)2    0.03243085 9.741307e-01
summary(lmm3.sqrt.L)$tTable
##                                                   Value   Std.Error   DF
## (Intercept)                                1.2007417748 0.093333148 2838
## conc0.12                                  -0.0010617059 0.202468025  144
## conc0.4                                    0.0115247848 0.202468025  144
## conc1.2                                    0.0267443323 0.197989505  144
## conc4                                     -0.3376316051 0.202468025  144
## conc12                                    -0.0549902219 0.197989505  144
## poly(t, degree = 2, raw = TRUE)1           0.0546681427 0.006662243 2838
## poly(t, degree = 2, raw = TRUE)2          -0.0043724627 0.001186387 2838
## conc0.12:poly(t, degree = 2, raw = TRUE)1 -0.0053093841 0.014452434 2838
## conc0.4:poly(t, degree = 2, raw = TRUE)1  -0.0086924490 0.014452434 2838
## conc1.2:poly(t, degree = 2, raw = TRUE)1  -0.0137281316 0.014132751 2838
## conc4:poly(t, degree = 2, raw = TRUE)1    -0.0225735801 0.014452434 2838
## conc12:poly(t, degree = 2, raw = TRUE)1   -0.0443570007 0.014132751 2838
## conc0.12:poly(t, degree = 2, raw = TRUE)2  0.0005451126 0.002573636 2838
## conc0.4:poly(t, degree = 2, raw = TRUE)2   0.0013876797 0.002573636 2838
## conc1.2:poly(t, degree = 2, raw = TRUE)2   0.0008845858 0.002516708 2838
## conc4:poly(t, degree = 2, raw = TRUE)2     0.0021043322 0.002573636 2838
## conc12:poly(t, degree = 2, raw = TRUE)2   -0.0007892709 0.002516708 2838
##                                               t-value      p-value
## (Intercept)                               12.86511601 7.428137e-37
## conc0.12                                  -0.00524382 9.958233e-01
## conc0.4                                    0.05692151 9.546866e-01
## conc1.2                                    0.13507955 8.927376e-01
## conc4                                     -1.66757988 9.757244e-02
## conc12                                    -0.27774311 7.816079e-01
## poly(t, degree = 2, raw = TRUE)1           8.20566666 3.437490e-16
## poly(t, degree = 2, raw = TRUE)2          -3.68552699 2.324993e-04
## conc0.12:poly(t, degree = 2, raw = TRUE)1 -0.36736956 7.133708e-01
## conc0.4:poly(t, degree = 2, raw = TRUE)1  -0.60145227 5.475868e-01
## conc1.2:poly(t, degree = 2, raw = TRUE)1  -0.97137009 3.314468e-01
## conc4:poly(t, degree = 2, raw = TRUE)1    -1.56192242 1.184178e-01
## conc12:poly(t, degree = 2, raw = TRUE)1   -3.13859636 1.715019e-03
## conc0.12:poly(t, degree = 2, raw = TRUE)2  0.21180644 8.322733e-01
## conc0.4:poly(t, degree = 2, raw = TRUE)2   0.53919045 5.897977e-01
## conc1.2:poly(t, degree = 2, raw = TRUE)2   0.35148531 7.252504e-01
## conc4:poly(t, degree = 2, raw = TRUE)2     0.81764963 4.136259e-01
## conc12:poly(t, degree = 2, raw = TRUE)2   -0.31361247 7.538384e-01
summary(lmm3.log.L)$tTable
##                                                   Value    Std.Error   DF
## (Intercept)                                3.977517e-01 0.0319960272 2838
## conc0.12                                   3.161728e-05 0.0694091282  144
## conc0.4                                    6.133096e-03 0.0694091282  144
## conc1.2                                    4.742247e-03 0.0678738234  144
## conc4                                     -1.223361e-01 0.0694091282  144
## conc12                                    -3.221931e-02 0.0678738234  144
## poly(t, degree = 2, raw = TRUE)1           1.969623e-02 0.0023074832 2838
## poly(t, degree = 2, raw = TRUE)2          -1.630058e-03 0.0003878003 2838
## conc0.12:poly(t, degree = 2, raw = TRUE)1 -1.431997e-03 0.0050056338 2838
## conc0.4:poly(t, degree = 2, raw = TRUE)1  -3.253435e-03 0.0050056338 2838
## conc1.2:poly(t, degree = 2, raw = TRUE)1  -5.295017e-03 0.0048949110 2838
## conc4:poly(t, degree = 2, raw = TRUE)1    -8.658313e-03 0.0050056338 2838
## conc12:poly(t, degree = 2, raw = TRUE)1   -1.601262e-02 0.0048949110 2838
## conc0.12:poly(t, degree = 2, raw = TRUE)2  1.053422e-04 0.0008412570 2838
## conc0.4:poly(t, degree = 2, raw = TRUE)2   3.089065e-04 0.0008412570 2838
## conc1.2:poly(t, degree = 2, raw = TRUE)2   2.900603e-04 0.0008226487 2838
## conc4:poly(t, degree = 2, raw = TRUE)2     7.471815e-04 0.0008412570 2838
## conc12:poly(t, degree = 2, raw = TRUE)2   -9.200994e-05 0.0008226487 2838
##                                                 t-value      p-value
## (Intercept)                               12.4312855235 1.382469e-34
## conc0.12                                   0.0004555205 9.996372e-01
## conc0.4                                    0.0883615150 9.297121e-01
## conc1.2                                    0.0698685737 9.443952e-01
## conc4                                     -1.7625368888 8.010021e-02
## conc12                                    -0.4746941495 6.357241e-01
## poly(t, degree = 2, raw = TRUE)1           8.5358057009 2.232203e-17
## poly(t, degree = 2, raw = TRUE)2          -4.2033447622 2.711251e-05
## conc0.12:poly(t, degree = 2, raw = TRUE)1 -0.2860771083 7.748399e-01
## conc0.4:poly(t, degree = 2, raw = TRUE)1  -0.6499547232 5.157741e-01
## conc1.2:poly(t, degree = 2, raw = TRUE)1  -1.0817392189 2.794603e-01
## conc4:poly(t, degree = 2, raw = TRUE)1    -1.7297136199 8.379018e-02
## conc12:poly(t, degree = 2, raw = TRUE)1   -3.2712791482 1.083433e-03
## conc0.12:poly(t, degree = 2, raw = TRUE)2  0.1252199681 9.003583e-01
## conc0.4:poly(t, degree = 2, raw = TRUE)2   0.3671962931 7.135000e-01
## conc1.2:poly(t, degree = 2, raw = TRUE)2   0.3525931987 7.244196e-01
## conc4:poly(t, degree = 2, raw = TRUE)2     0.8881726875 3.745231e-01
## conc12:poly(t, degree = 2, raw = TRUE)2   -0.1118459662 9.109534e-01

Take-aways: the transformation that we choose for the data is going to have little to no effect on the heteroscedasticity of residuals vs. fitted values.

Evaluate Square-Root Model in a Few Ways

Fit optimal sqrt model with REML and visualize population level fits.

lmm3.L.REML <- lme(sqrty ~ conc*poly(t,degree=2,raw=TRUE),
                   data = Light.sqrt.1,
                   random = reStruct(~poly(t,degree=2,raw=TRUE)|fishID,REML=TRUE,pdClass="pdSymm"),
                   correlation = corAR1(form=~t|fishID),
                   method = "REML",
                   control = ctrl)

I don’t think I am making the above confidence intervals correctly.
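For reference, a standard sketch of approximate population-level intervals built from the fixed-effects design matrix (this ignores random-effects uncertainty, so the bands describe the mean response only):

# Prediction grid over observed times and concentration groups
newdat <- expand.grid(t = sort(unique(Light.sqrt.1$t)),
                      conc = factor(levels(Light.sqrt.1$conc),
                                    levels = levels(Light.sqrt.1$conc)))
X <- model.matrix(~ conc*poly(t, degree = 2, raw = TRUE), data = newdat)
newdat$pred <- as.vector(X %*% fixef(lmm3.L.REML))
newdat$se   <- sqrt(diag(X %*% vcov(lmm3.L.REML) %*% t(X)))
newdat$lo   <- newdat$pred - 1.96*newdat$se
newdat$hi   <- newdat$pred + 1.96*newdat$se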

Plot the residuals for concentration mean values.

I believe we are poorly fitting one of these concentration groups. I can't think of any immediate way to deal with this, and I doubt a different linear model would capture these points well. My current guess is that modeling the heterogeneity of time will be the best way to deal with these issues. Hopefully modeling heterogeneous time within subjects will induce a sample-level correlation structure that takes this into account.

to.plot[(abs(to.plot$stResid) > 2),]
##    conc   t         x     pred        SE    upper     lower      resid
## 58  0.4 7.5 0.9979257 1.396458 0.1812585 1.758975 1.0339411 -0.3985323
## 59  0.4 8.5 0.9248516 1.398504 0.1904875 1.779479 1.0175292 -0.4736526
## 75  1.2 4.5 1.8673364 1.338035 0.1740879 1.686210 0.9898589  0.5293018
## 76  1.2 5.5 1.7761049 1.343961 0.1723221 1.688605 0.9993171  0.4321437
##      stResid
## 58 -2.360998
## 59 -2.806027
## 75  3.135706
## 76  2.560119

Something I just noticed: fitting with REML completely changes the correlation of the random effects, increasing it immensely.

lmm3.sqrt.L$modelStruct$reStruct
## Random effects:
##  Formula: ~poly(t, degree = 2, raw = TRUE) | fishID
##  Structure: General positive-definite
##                                  StdDev      Corr          
## (Intercept)                      0.791063813 (Intr) pd=2r=T
## poly(t, degree = 2, raw = TRUE)1 0.032934649  0.671        
## poly(t, degree = 2, raw = TRUE)2 0.006653397 -0.815 -0.118 
## Residual                         1.000000000
lmm3.L.REML$modelStruct$reStruct
## Random effects:
##  Formula: ~poly(t, degree = 2, raw = TRUE) | fishID
##  Structure: General positive-definite
##                                  StdDev      Corr          
## (Intercept)                      0.735584543 (Intr) pd=2r=T
## poly(t, degree = 2, raw = TRUE)1 0.026852598  1            
## poly(t, degree = 2, raw = TRUE)2 0.004121189 -1     -1     
## Residual                         1.000000000

Plot confidence intervals about fixed parameters to assess if estimator accuracy changes.
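A sketch of the comparison via nlme's intervals(); which = "fixed" avoids needing the random-effects apVar, which was disabled in ctrl:

# 95% confidence intervals on the fixed effects, ML vs. REML fits
intervals(lmm3.sqrt.L, which = "fixed")
intervals(lmm3.L.REML, which = "fixed")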

No apparent changes in estimator accuracy when fitting with REML.

Evaluate Assumptions of LME Model Using Model Fit with REML

From the residual plots of the data we are obviously breaking many of the underlying assumptions of the linear mixed-effects model: homoscedasticity of repeated measurements, and normality of subject-level residuals. However, I am unsure how much these assumptions matter for inferences made at the level of the sample. From Davidian's course I know there is some leeway allowed in regards to the positive-definiteness of the random-effects variance-covariance matrix, but I don't know whether other assumptions can be relaxed.

The residuals about fitted values for concentration groups look acceptable. Is this enough?

Either way I am going to evaluate the model as recommended in Mixed Effects Models in S and S-Plus.

Assumption 1: Within group errors are normal and i.i.d., with mean zero.

To assess this we need to evaluate the normality of the within-group errors, the homoscedasticity of those errors by group, the means of those errors, and the within-group correlation of those errors (which we addressed with an AR(1) model).

Evaluate within-group errors means and variance by plotting all within group errors stratified by group (fishID).
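A sketch using plot.lme's formula interface, which draws one boxplot of standardized residuals per fish:

# Within-group residuals stratified by fishID
plot(lmm3.L.REML, fishID ~ resid(., type = "p"), abline = 0)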

Error means appear to be nearly centered at zero, but the errors also appear skewed, as displayed by the presence of more outliers in the positive x-direction than in the negative x-direction. This probably reflects the high occurrence of zero values and some skew that wasn't adjusted by the power transform.

Variance is not constant for all groups/fish, with some fish being highly variable, but could this be attributed in some way to sampling error?

There seems to be increasing variance as one moves up the y-axis. I believe fish on the y-axis are ordered by plate number. Potentially we can attribute this change in variability to plate number, or perhaps to plate location.

Plot residuals faceted by plate number.

Variability in residuals is fairly consistent by plate; DNT-79 seems particularly tight. It appears that some plates are more active, or are at least fitted with higher values. Refit the model with a well-location variable included to evaluate the effect of well location on subject variance, and plot.

Here it seems more apparent that well location is having an effect on within-group variance. My guess is that this is because exterior wells tend to be less active than interior wells. This difference between interior and exterior wells was adjusted for in one paper using random effects and fixed effects; maybe we can replicate that. Another paper included random effects for plate; maybe we can replicate this as well.

Evaluate heteroscedasticity with time.

Heteroscedasticity with time is obvious, but isn't that strong, and could be adjusted for with a power variance function. Centrality of errors at zero is nearly met.
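If we go that route, a power-of-time variance function can be sketched with nlme's varPower (untested against this data, and possibly slow to converge):

# Residual variance modeled as a power of time
lmm3.vp <- update(lmm3.L.REML, weights = varPower(form = ~t))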

Take-aways: This model isn't accounting for confounding factors that likely affect mean responses and appear to produce heterogeneity in the underlying data, specifically plate and location effects, and time effects. There are ways to account for these effects, some of which are addressed in published applications of LME models to zebrafish data (plate and well effects). I'm guessing that the source of heterogeneity with the largest effect on the normality of our sample-level residuals is time. This can be adjusted for; figuring out how is another story.

Assess Assumptions on Random Effects

Random effects are assumed to be normally distributed with mean zero and a covariance matrix that is identical for all groups, and to be independent across groups. Departures from these assumptions are assessed using qq-plots and pairwise correlation plots to assess marginal normality, identify outliers, and check the assumption of homogeneity of variance.
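A sketch of both diagnostics using nlme's built-in methods:

# Marginal normality of each random effect
qqnorm(lmm3.L.REML, ~ranef(.))
# Pairwise scatterplots of the random-effects estimates
pairs(lmm3.L.REML, ~ranef(.))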

A high degree of correlation between the random-effects estimates is apparent in the qq-plots. The random effects are nearly normal, with some obvious high outliers.

Where are these outliers occurring?

##                                srcf              acid         cpid   apid rowi
## 1: All Chemicals one per sheet.xlsx ZFpmrALD-20-40-40 Tebuconazole DNT-79    3
## 2: All Chemicals one per sheet.xlsx ZFpmrALD-20-40-40         DMSO DNT-81    8
## 3: All Chemicals one per sheet.xlsx ZFpmrALD-20-40-40 Tebuconazole DNT-81    6
## 4: All Chemicals one per sheet.xlsx ZFpmrALD-20-40-40 Tebuconazole DNT-82    7
## 5: All Chemicals one per sheet.xlsx ZFpmrALD-20-40-40 Tebuconazole DNT-82    4
## 6: All Chemicals one per sheet.xlsx ZFpmrALD-20-40-40 Tebuconazole DNT-83    5
## 7: All Chemicals one per sheet.xlsx ZFpmrALD-20-40-40         DMSO DNT-84    8
##    coli wllt wllq  conc      fishID
## 1:   11    t    1 12.00 DNT-79.3.11
## 2:    9    v    1  0.00  DNT-81.8.9
## 3:    7    t    1  0.40  DNT-81.6.7
## 4:   11    t    1  0.12 DNT-82.7.11
## 5:   11    t    1  4.00 DNT-82.4.11
## 6:    3    t    1  1.20  DNT-83.5.3
## 7:   10    v    1  0.00 DNT-84.8.10

Outliers are not occurring in any one concentration group, which is reassuring. Continuing with that thought, assess the homogeneity of the random effects by concentration group.
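A sketch using augFrame = TRUE, which attaches covariates that are constant within each fish (such as conc) to the random-effects estimates:

# Random effects split by concentration group
re <- ranef(lmm3.L.REML, augFrame = TRUE)
plot(re, form = ~conc)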

One might be able to argue that there is some heterogeneity, but it is not occurring in a concentration-dependent way. The 0.4 uM group appears to be less variable than the other groups. A potential worry here is that heterogeneity across concentration groups can pull random-effects estimates away from their true values to compensate, i.e., fish in 0.4 uM being fit with values that are too large.

That is not obvious.

Take-Away: Overall, the assumptions on the random effects of the model appear to be adhered to.